Index-Based Persistent Document Identifiers
Similar resources
Representing Document Lengths with Identifiers
Most common text retrieval scoring functions need the length of each indexed document in order to rank it against the current query. For efficiency, information retrieval systems keep this information in main memory. This paper proposes a novel strategy to encode the length of each document directly in the document identifier, thus reducing main memory demand. The techniq...
Can Persistent Identifiers Be Cool?
The fast growth of scientific and non-scientific digital data, as well as the proliferation of new types of digital content, has led – among many other things – to a lot of innovative work on the concept of the identifier. Digital identifiers have become the key to preserving and accessing content, just as physical identifier tags have been the key to accessing paper-based content and other phy...
Implementing Persistent Identifiers. Overview of concepts, guidelines and recommendations
Traditionally, references to web content have been made by using URL hyperlinks. However, as links are 'broken' when content is moved to another location, a reference system based on URLs is inherently unstable and poses risks for continued access to web resources. To create a more reliable system for referring to published material on the web, from the mid-1990s a number of schemes have been d...
Assigning Document Identifiers to Enhance Compressibility of Fulltext Indices
Index compression has been a major issue in the field of Information Retrieval Systems. In particular, given the impressive figures involved with Web Search Engines (WSEs), compressing the index is no longer optional but a necessity. The most important index compression methods are designed to work for Inverted File (IF) indexes. These methods are based on the assumption that...
Phrase-based Document Similarity Based on an Index Graph Model
Document clustering techniques mostly rely on single term analysis of the document data set, such as the Vector Space Model. To better capture the structure of documents, the underlying data model should be able to represent the phrases in the document as well as single terms. We present a novel data model, the Document Index Graph, which indexes web documents based on phrases, rather than sing...
Journal
Journal title: Information Retrieval
Year: 2005
ISSN: 1386-4564
DOI: 10.1023/b:inrt.0000048494.05013.6a